Lesson 6

01. Welcome

Notes:

getwd()
## [1] "/Users/webstrum/Study/old/udacity/docs/rcode/P07M01L09"
setwd("/Users/webstrum/Study/old/udacity/docs/rcode/P07M01L09")

02. Linear Regression Models

In this lesson, Solomon will be using a linear regression model to predict diamond price using other variables in the diamonds dataset.

03. Scatterplot Review

# Let's start by examining two variables in the data set.
# The scatterplot is a powerful tool to help you understand
# the relationship between two continuous variables.

# We can quickly see if the relationship is linear or not.
# In this case, we can use a variety of diamond
# characteristics to help us figure out whether
# the price advertised for any given diamond is 
# reasonable or a rip-off.

# Let's consider the price of a diamond and it's carat weight.
# Create a scatterplot of price (y) vs carat weight (x).

# Limit the x-axis and y-axis to omit the top 1% of values.
library("ggplot2")
data("diamonds")

qplot(data = diamonds, x = carat, y = price, 
      xlim =  c(0, quantile(diamonds$carat, 0.99)),
      ylim = c(0, quantile(diamonds$price, 0.99))) +
  geom_point(fill = "orange", color = "black", shape = 21)
## Warning: Removed 926 rows containing missing values (geom_point).

## Warning: Removed 926 rows containing missing values (geom_point).

ggplot(data = diamonds, aes(x = carat, y = price)) +
  geom_point(fill = "blue", color = "black", shape = 21) +
  coord_cartesian(xlim = c(0, quantile(diamonds$carat, 0.99)), ylim = c(0, quantile(diamonds$price, 0.99)))


04. Price and Carat Relationship

Response: We can see a nonlinear relationship. Maybe it’s exponential or maybe it’s something else. We can see that the dispersion or variance of the relationship also increases as carat size increases.

ggplot(data = diamonds, aes(x = carat, y = price)) +
  geom_point(fill = "blue", color = "black", shape = 21) +
  coord_cartesian(xlim = c(0, quantile(diamonds$carat, 0.99)), ylim = c(0, quantile(diamonds$price, 0.99))) +
  stat_smooth(method = "lm")


05. Frances Gerety

Notes:

Frances Gerety coined a famous slogan complete that slogan. > A diamonds is forever.


06. The Rise of Diamonds

Notes:

The Counte of Monte Cristo and Knocked Up are two movies that have proposals without engagement rings.


07. ggpairs Function

Notes:

# install these if necessary
# install.packages('GGally')
# install.packages('scales')
# install.packages('memisc')
# install.packages('lattice')
# install.packages('MASS')
# install.packages('car')
# install.packages('reshape')
# install.packages('plyr')

# load the ggplot graphics package and the others
library(ggplot2)
library(GGally)
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
library(lattice)
library(MASS)
library(scales)
library(memisc)
## 
## Attaching package: 'memisc'
## The following object is masked from 'package:scales':
## 
##     percent
## The following object is masked from 'package:ggplot2':
## 
##     syms
## The following objects are masked from 'package:stats':
## 
##     contr.sum, contr.treatment, contrasts
## The following object is masked from 'package:base':
## 
##     as.array
# sample 10,000 diamonds from the data set
set.seed(20022012)
diamond_samp <- diamonds[sample(1:length(diamonds$price), 10000), ]
ggpairs(diamond_samp, progress = FALSE,
  lower = list(continuous = wrap("points", shape = I('.'))),
  upper = list(combo = wrap("box", outlier.shape = I('.'))))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

What are some things you notice in the ggpairs output?

Response:

We can see what might be relationships between price and clarity and price and color, which we’ll keep in mind for later when start modelling our data.


08. The Demand of Diamonds

Notes:

# Create two histograms of the price variable
# and place them side by side on one output image.

# We’ve put some code below to get you started.

# The first plot should be a histogram of price
# and the second plot should transform
# the price variable using log10.

# Set appropriate bin widths for each plot.
# ggtitle() will add a title to each histogram.

# You can self-assess your work with the plots
# in the solution video.

library(gridExtra)

plot1 <- qplot(data = diamond_samp, x = price, binwidth = 100, fill = I("#099DD9")) + 
  ggtitle('Price')

plot2 <- qplot(data = diamond_samp, x = price, binwidth = 0.01, fill = I("#F79420")) +
  ggtitle('Price (log10)') +
  scale_x_log10()

grid.arrange(plot1, plot2, ncol = 2)


09. Connecting Demand and Price Distributions

Notes: Indeed we can see that the prices for diamonds are pretty heavily skewed, but when you put those prices on a log ten scale, they seem much better behaved. They’re much closer to the bell curve of a normal distribution. We can even see a little bit of evidence of bimodality on this log ten scale. Which is consistent with our two class rich buyer poor buyer speculation about the nature of customers for diamonds. You actually saw a sneak peek of this in problem set set three when you looked at the log ten of price by cut.


10. Scatterplot Transformation

qplot(carat, price, data = diamonds) +
  scale_y_continuous(trans = log10_trans()) +
  ggtitle("Price (log10) by Carat")

  • Create a new function to transform the carat variable
cuberoot_trans = function() trans_new('cuberoot', transform = function(x) x^(1/3),
                                      inverse = function(x) x^3)
  • Use the cuberoot_trans function
ggplot(aes(carat, price), data = diamonds) + 
  geom_point() + 
  scale_x_continuous(trans = cuberoot_trans(), limits = c(0.2, 3),
                     breaks = c(0.2, 0.5, 1, 2, 3)) + 
  scale_y_continuous(trans = log10_trans(), limits = c(350, 15000),
                     breaks = c(350, 1000, 5000, 10000, 15000)) +
  ggtitle('Price (log10) by Cube-Root of Carat')
## Warning: Removed 1683 rows containing missing values (geom_point).


11. Overplotting Revisited

head(sort(table(diamonds$carat), decreasing = T))
## 
##  0.3 0.31 1.01  0.7 0.32    1 
## 2604 2249 2242 1981 1840 1558
head(sort(table(diamonds$price), decreasing = T))
## 
## 605 802 625 828 776 698 
## 132 127 126 125 124 121
# Add a layer to adjust the features of the
# scatterplot. Set the transparency to one half,
# the size to three-fourths, and jitter the points.

# If you need hints, see the Instructor Notes.
# There are three hints so scroll down slowly if
# you don’t want all the hints at once.

ggplot(aes(carat, price), data = diamonds) + 
  geom_point(alpha = 0.5, size = 0.75, position = "jitter") + 
  scale_x_continuous(trans = cuberoot_trans(), limits = c(0.2, 3),
                     breaks = c(0.2, 0.5, 1, 2, 3)) + 
  scale_y_continuous(trans = log10_trans(), limits = c(350, 15000),
                     breaks = c(350, 1000, 5000, 10000, 15000)) +
  ggtitle('Price (log10) by Cube-Root of Carat')
## Warning: Removed 1691 rows containing missing values (geom_point).


12. Plot Colors for Qualitative Factors

Notes

No single diamond is perfect for everyone—but all of our customers, whether they’re eyeing a .50-carat or a 16-carat diamond, want as much sparkle as their budget allows. Of the the 4Cs (cut, color, clarity, carat), cut has the greatest influence on a diamond’s beauty and sparkle. Even a diamond with a flawless clarity grade (no blemishes or inclusions) can look glassy or dull if the cut is too shallow or deep. So, when determining what diamond to buy, go with the best cut grade that you can afford.


13. Price vs. Carat and Clarity

Alter the code below.

# install and load the RColorBrewer package
# install.packages('RColorBrewer')
library(RColorBrewer)

# Adjust the code below to color the points by clarity.

# A layer called scale_color_brewer() has 
# been added to adjust the legend and
# provide custom colors.

# See if you can figure out what it does.
# Links to resources are in the Instructor Notes.

# You will need to install the package RColorBrewer
# in R to get the same colors and color palettes.

ggplot(aes(x = carat, y = price), data = diamonds) + 
  geom_point(aes(color = clarity), alpha = 0.5, size = 1, position = 'jitter') +
  scale_color_brewer(type = 'div',
    guide = guide_legend(title = 'Clarity', reverse = T,
    override.aes = list(alpha = 1, size = 2))) +  
  scale_x_continuous(trans = cuberoot_trans(), limits = c(0.2, 3),
    breaks = c(0.2, 0.5, 1, 2, 3)) + 
  scale_y_continuous(trans = log10_trans(), limits = c(350, 15000),
    breaks = c(350, 1000, 5000, 10000, 15000)) +
  ggtitle('Price (log10) by Cube-Root of Carat and Clarity')
## Warning: Removed 1694 rows containing missing values (geom_point).


14. Clarity and Price

Response:

Holding carat weight constant, we’re looking at one part of the plot. We see the diamonds with lower clarity are almost always cheaper than diamonds with better clarity.


15. Price vs. Carat and Cut

Alter the code below.

# Let’s look at cut and see if we find a similar result.

# Adjust the code below to color the points by cut.
# Change any other parts of the code as needed.

ggplot(aes(x = carat, y = price), data = diamonds) + 
  geom_point(aes(color = cut), alpha = 0.5, size = 1, position = 'jitter') +
  scale_color_brewer(type = 'div',
                     guide = guide_legend(title = 'Cut', reverse = T,
                                          override.aes = list(alpha = 1, size = 2))) +  
  scale_x_continuous(trans = cuberoot_trans(), limits = c(0.2, 3),
                     breaks = c(0.2, 0.5, 1, 2, 3)) + 
  scale_y_continuous(trans = log10_trans(), limits = c(350, 15000),
                     breaks = c(350, 1000, 5000, 10000, 15000)) +
  ggtitle('Price (log10) by Cube-Root of Carat and Cut')
## Warning: Removed 1690 rows containing missing values (geom_point).


16. Cut and Price

Response:

Despite what Blue Nile says, we don’t see much variation on cut. Most of the diamonds in the data are ideal cut anyway, so we’ve lost the color pattern that we saw before.


17. Price vs. Carat and Color

Alter the code below.

# Finally, let’s use diamond color to color our plot.

# Adjust the code below to color the points by diamond colors
# and change the titles.

ggplot(aes(x = carat, y = price), data = diamonds) + 
  geom_point(aes(color = color), alpha = 0.5, size = 1, position = 'jitter') +
  scale_color_brewer(type = 'div',
                     guide = guide_legend(title = "Color", reverse = F,
                                          override.aes = list(alpha = 1, size = 2))) +  
  scale_x_continuous(trans = cuberoot_trans(), limits = c(0.2, 3),
                     breaks = c(0.2, 0.5, 1, 2, 3)) + 
  scale_y_continuous(trans = log10_trans(), limits = c(350, 15000),
                     breaks = c(350, 1000, 5000, 10000, 15000)) +
  ggtitle('Price (log10) by Cube-Root of Carat and Color')
## Warning: Removed 1690 rows containing missing values (geom_point).


18. Color and Price

Response:

Color does seem to explain some of the variance in price. Just like we saw with the clarity variable. Blue now however, states that the difference between all color grades from D to J are basically not noticeable to the naked eye. Yet, we do see the color difference in the price tag.


19. Linear Models in R

Notes:

Use the lm() function lm(y ~ x)

Response:

Remember we applied the log transformation to our long tailed dollar variable, and we speculated that the flawless diamond should become exponentially rarer as diamond volume increases.

log(price) ~ carat^(1/3)


20. Building the Linear Model

Notes:

m1 <- lm(I(log(price)) ~ I(carat^(1/3)), data = diamonds)
m2 <- update(m1, ~ . + carat)
m3 <- update(m2, ~ . + cut)
m4 <- update(m3, ~ . + color)
m5 <- update(m4, ~ . + clarity)
mtable(m1, m2, m3, m4, m5, sdigits = 3, summary.stats=c("sigma","R-squared","F","p", "log-likelihood","Deviance", "N"))
## 
## Calls:
## m1: lm(formula = I(log(price)) ~ I(carat^(1/3)), data = diamonds)
## m2: lm(formula = I(log(price)) ~ I(carat^(1/3)) + carat, data = diamonds)
## m3: lm(formula = I(log(price)) ~ I(carat^(1/3)) + carat + cut, data = diamonds)
## m4: lm(formula = I(log(price)) ~ I(carat^(1/3)) + carat + cut + color, 
##     data = diamonds)
## m5: lm(formula = I(log(price)) ~ I(carat^(1/3)) + carat + cut + color + 
##     clarity, data = diamonds)
## 
## ============================================================================================
##                        m1             m2             m3             m4            m5        
## --------------------------------------------------------------------------------------------
##   (Intercept)          2.821***       1.039***       0.874***      0.932***       0.415***  
##                       (0.006)        (0.019)        (0.019)       (0.017)        (0.010)    
##   I(carat^(1/3))       5.558***       8.568***       8.703***      8.438***       9.144***  
##                       (0.007)        (0.032)        (0.031)       (0.028)        (0.016)    
##   carat                              -1.137***      -1.163***     -0.992***      -1.093***  
##                                      (0.012)        (0.011)       (0.010)        (0.006)    
##   cut: .L                                            0.224***      0.224***       0.120***  
##                                                     (0.004)       (0.004)        (0.002)    
##   cut: .Q                                           -0.062***     -0.062***      -0.031***  
##                                                     (0.004)       (0.003)        (0.002)    
##   cut: .C                                            0.051***      0.052***       0.014***  
##                                                     (0.003)       (0.003)        (0.002)    
##   cut: ^4                                            0.018***      0.018***      -0.002     
##                                                     (0.003)       (0.002)        (0.001)    
##   color: .L                                                       -0.373***      -0.441***  
##                                                                   (0.003)        (0.002)    
##   color: .Q                                                       -0.129***      -0.093***  
##                                                                   (0.003)        (0.002)    
##   color: .C                                                        0.001         -0.013***  
##                                                                   (0.003)        (0.002)    
##   color: ^4                                                        0.029***       0.012***  
##                                                                   (0.003)        (0.002)    
##   color: ^5                                                       -0.016***      -0.003*    
##                                                                   (0.003)        (0.001)    
##   color: ^6                                                       -0.023***       0.001     
##                                                                   (0.002)        (0.001)    
##   clarity: .L                                                                     0.907***  
##                                                                                  (0.003)    
##   clarity: .Q                                                                    -0.240***  
##                                                                                  (0.003)    
##   clarity: .C                                                                     0.131***  
##                                                                                  (0.003)    
##   clarity: ^4                                                                    -0.063***  
##                                                                                  (0.002)    
##   clarity: ^5                                                                     0.026***  
##                                                                                  (0.002)    
##   clarity: ^6                                                                    -0.002     
##                                                                                  (0.002)    
##   clarity: ^7                                                                     0.032***  
##                                                                                  (0.001)    
## --------------------------------------------------------------------------------------------
##   sigma                0.280          0.259          0.250         0.224          0.129     
##   R-squared            0.924          0.935          0.939         0.951          0.984     
##   F               652012.063     387489.366     138654.523     87959.467     173791.084     
##   p                    0.000          0.000          0.000         0.000          0.000     
##   Deviance          4242.831       3613.360       3380.837      2699.212        892.214     
##   N                53940          53940          53940         53940          53940         
## ============================================================================================
##   Significance: *** = p < 0.001; ** = p < 0.01; * = p < 0.05

Notice how adding cut to our model does not help explain much of the variance in the price of diamonds. This fits with out exploration earlier.


21. Model Problems

Video Notes:

Our model is lm(price) = 0.415 + 9.144 * carat^(1/3) - 1.093 * carat + (. * cut + . * color + . * clarity) + error

Research:

(Take some time to come up with 2-4 problems for the model) (You should 10-20 min on this)

Response:

2008->2014 * inflation * 2008 global recession * Diamond market in China growing * uneven recovery price increase across different carat weight


22. A Bigger, Better Data Set

Notes:

#install.package('bitops')
#install.packages('RCurl')
library('bitops')
library('RCurl')

#diamondsurl = getBinaryURL("https://raw.github.com/solomonm/diamonds-data/master/BigDiamonds.Rda")
#load(rawConnection(diamondsurl))
load("BigDiamonds.Rda")

The code used to obtain the data is available here: https://github.com/solomonm/diamonds-data

  • Building a Model Using the Big Diamonds Data Set

Notes:

# Your task is to build five linear models like Solomon
# did for the diamonds data set only this
# time you'll use a sample of diamonds from the
# diamondsbig data set.

# Be sure to make use of the same variables
# (logprice, carat, etc.) and model
# names (m1, m2, m3, m4, m5).

# To get the diamondsbig data into RStudio
# on your machine, copy, paste, and run the
# code in the Instructor Notes. There's
# 598,024 diamonds in this data set!

# Since the data set is so large,
# you are going to use a sample of the
# data set to compute the models. You can use
# the entire data set on your machine which
# will produce slightly different coefficients
# and statistics for the models.

# You can leave off the code to load in the data.
# We've sampled the data for you.
# You also don't need code to create the table output of the models.
# We'll do that for you and check your model summaries (R^2 values, AIC, etc.)

# Your task is to write the code to create the models.

# DO NOT ALTER THE CODE BELOW THIS LINE (Reads in a sample of the diamondsbig data set)
# diamondsBigSample <- read.csv('diamondsBigSample.csv')
# sample 10,000 diamonds from the data set
# set.seed(20022012)
# diamondsbig_samp <- diamondsbig[sample(1:length(diamondsbig$price), 250000), ]

# ENTER YOUR CODE BELOW THIS LINE. (Create the five models)
diamondsbig$logprice <- log(diamondsbig$price)
m1 <- lm(logprice ~ I(carat^(1/3)), 
         data = diamondsbig[diamondsbig$price < 10000 &
                              diamondsbig$cert == "GIA", ])
m2 <- update(m1, ~ . + carat)
m3 <- update(m2, ~ . + cut)
m4 <- update(m3, ~ . + color)
m5 <- update(m4, ~ . + clarity)
mtable(m1, m2, m3, m4, m5, summary.stats=c("sigma","R-squared","F","p", "log-likelihood","Deviance", "N"))
## 
## Calls:
## m1: lm(formula = logprice ~ I(carat^(1/3)), data = diamondsbig[diamondsbig$price < 
##     10000 & diamondsbig$cert == "GIA", ])
## m2: lm(formula = logprice ~ I(carat^(1/3)) + carat, data = diamondsbig[diamondsbig$price < 
##     10000 & diamondsbig$cert == "GIA", ])
## m3: lm(formula = logprice ~ I(carat^(1/3)) + carat + cut, data = diamondsbig[diamondsbig$price < 
##     10000 & diamondsbig$cert == "GIA", ])
## m4: lm(formula = logprice ~ I(carat^(1/3)) + carat + cut + color, 
##     data = diamondsbig[diamondsbig$price < 10000 & diamondsbig$cert == 
##         "GIA", ])
## m5: lm(formula = logprice ~ I(carat^(1/3)) + carat + cut + color + 
##     clarity, data = diamondsbig[diamondsbig$price < 10000 & diamondsbig$cert == 
##     "GIA", ])
## 
## ===============================================================================================
##                         m1              m2             m3             m4             m5        
## -----------------------------------------------------------------------------------------------
##   (Intercept)           2.671***        1.333***       0.949***       0.529***      -0.464***  
##                        (0.003)         (0.012)        (0.012)        (0.010)        (0.009)    
##   I(carat^(1/3))        5.839***        8.243***       8.633***       8.110***       8.320***  
##                        (0.004)         (0.022)        (0.021)        (0.017)        (0.012)    
##   carat                                -1.061***      -1.223***      -0.782***      -0.763***  
##                                        (0.009)        (0.009)        (0.007)        (0.005)    
##   cut: V.Good                                          0.120***       0.090***       0.071***  
##                                                       (0.002)        (0.001)        (0.001)    
##   cut: Ideal                                           0.211***       0.181***       0.131***  
##                                                       (0.002)        (0.001)        (0.001)    
##   color: K/L                                                          0.123***       0.117***  
##                                                                      (0.004)        (0.003)    
##   color: J/L                                                          0.312***       0.318***  
##                                                                      (0.003)        (0.002)    
##   color: I/L                                                          0.451***       0.469***  
##                                                                      (0.003)        (0.002)    
##   color: H/L                                                          0.569***       0.602***  
##                                                                      (0.003)        (0.002)    
##   color: G/L                                                          0.633***       0.665***  
##                                                                      (0.003)        (0.002)    
##   color: F/L                                                          0.687***       0.723***  
##                                                                      (0.003)        (0.002)    
##   color: E/L                                                          0.729***       0.756***  
##                                                                      (0.003)        (0.002)    
##   color: D/L                                                          0.812***       0.827***  
##                                                                      (0.003)        (0.002)    
##   clarity: I1                                                                        0.301***  
##                                                                                     (0.006)    
##   clarity: SI2                                                                       0.607***  
##                                                                                     (0.006)    
##   clarity: SI1                                                                       0.727***  
##                                                                                     (0.006)    
##   clarity: VS2                                                                       0.836***  
##                                                                                     (0.006)    
##   clarity: VS1                                                                       0.891***  
##                                                                                     (0.006)    
##   clarity: VVS2                                                                      0.935***  
##                                                                                     (0.006)    
##   clarity: VVS1                                                                      0.995***  
##                                                                                     (0.006)    
##   clarity: IF                                                                        1.052***  
##                                                                                     (0.006)    
## -----------------------------------------------------------------------------------------------
##   sigma                 0.289           0.284          0.275          0.216          0.154     
##   R-squared             0.888           0.892          0.899          0.937          0.969     
##   F               2700903.714     1406538.330     754405.425     423311.488     521161.443     
##   p                     0.000           0.000          0.000          0.000          0.000     
##   Deviance          28298.689       27291.534      25628.285      15874.910       7992.720     
##   N                338946          338946         338946         338946         338946         
## ===============================================================================================
##   Significance: *** = p < 0.001; ** = p < 0.01; * = p < 0.05
# DO NOT ALTER THE CODE BELOW THIS LINE (Tables your models and pulls out the statistics)
suppressMessages(library(lattice))
suppressMessages(library(MASS))
suppressMessages(library(memisc))
models <- mtable(m1, m2, m3, m4, m5)

23. Predictions

Example Diamond from BlueNile: Round 1.00 Very Good I VS1 $5,601

#Be sure you’ve loaded the library memisc and have m5 saved as an object in your workspace.
thisDiamond = data.frame(carat = 1.00, cut = "V.Good",
                         color = "I", clarity="VS1")
modelEstimate = predict(m5, newdata = thisDiamond,
                        interval="prediction", level = .95)
exp(modelEstimate)
##        fit     lwr      upr
## 1 5040.436 3730.34 6810.638

Evaluate how well the model predicts the BlueNile diamond’s price. Think about the fitted point estimate as well as the 95% CI.

dat = data.frame(m4$model, m4$residuals)

with(dat, sd(m4.residuals))
## [1] 0.2164168
with(subset(dat, carat > .9 & carat < 1.1), sd(m4.residuals))
## [1] 0.2178147
dat$resid <- as.numeric(dat$m4.residuals)
ggplot(aes(y = resid, x = round(carat, 2)), data = dat) +
  geom_line(stat = "summary", fun.y = sd)
## Warning: Removed 1 rows containing missing values (geom_path).


24. Final Thoughts

Notes:

Data and models are never infallible and you can still get taken even equipped with this kind of analysis. There’s no substitute for establishing a personal connection and lasting business relationship with a jeweler you can trust.

25. Congratulations and Next Steps

We hope you feel prepared to explore data on your own, and answer questions that are interesting to you.


Click KnitHTML to see all of your hard work and to have an html page of this lesson, your answers, and your notes!